Extracting word lists for domain-specific implicit opinions from corpora

Authors

  • Nuria Bertomeu
  • Manfred Stede
Abstract

Sentiment analysis relies to a large extent on lexical resources. While lists of words bearing a context-independent evaluative polarity (‘great’, ‘bad’) are now available for many languages, the automatic extraction of domain-specific evaluative vocabulary still needs attention. This holds especially for implicit opinions, or so-called polar facts. In our work, we focus on German and on a genre that has not yet received much attention: customer emails. As the prime downstream application is identifying customers’ complaints, we concentrate here on finding negative words, but our method applies to positive ones as well. Using a seed list approach, we provide a comparative analysis along three dimensions: the effect of different seed lists, different linguistic analysis units, and different statistical correlation tests.
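The abstract does not say which correlation tests or analysis units were compared, so the following is only a minimal sketch of the general seed-list idea, not the authors' method. It assumes documents as the co-occurrence unit and pointwise mutual information (PMI) as the association measure; the function name `pmi_with_seeds` and the toy German tokens are hypothetical.

```python
import math
from collections import Counter

def pmi_with_seeds(documents, seeds):
    """Score each non-seed word by its PMI with the event
    'the document contains at least one seed word'.

    documents: list of token lists (assumed already tokenized/lemmatized)
    seeds: set of known negative words (the seed list)
    """
    n_docs = len(documents)
    seed_docs = 0                # documents containing any seed word
    word_docs = Counter()        # documents containing each candidate word
    word_seed_docs = Counter()   # documents containing the word AND a seed
    for tokens in documents:
        vocab = set(tokens)
        has_seed = bool(vocab & seeds)
        if has_seed:
            seed_docs += 1
        for word in vocab - seeds:
            word_docs[word] += 1
            if has_seed:
                word_seed_docs[word] += 1
    scores = {}
    for word, joint in word_seed_docs.items():
        # PMI = log2( P(word, seed) / (P(word) * P(seed)) )
        scores[word] = math.log2(joint * n_docs / (word_docs[word] * seed_docs))
    return scores

# Toy corpus standing in for customer emails (illustrative data only).
docs = [
    ["lieferung", "defekt", "schlecht"],
    ["lieferung", "defekt", "aergerlich"],
    ["lieferung", "schnell", "danke"],
    ["rechnung", "danke"],
]
seeds = {"schlecht", "aergerlich"}
scores = pmi_with_seeds(docs, seeds)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.3f}")
```

In this toy corpus, "defekt" co-occurs only with seed words and therefore ranks above "lieferung", which also appears in a neutral email. Words that never co-occur with a seed receive no score. A different test (for example chi-square or log-likelihood) or a smaller unit such as the sentence could be swapped in at the marked formula and in the loop over `documents`.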


Similar resources

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...


Conceptual Structure of Automatically Extracted Multi-Word Terms from Domain Specific Corpora: a Case Study for Italian

This paper is based on our efforts on automatic multi-word terms extraction and its conceptual structure for multiple languages. At present, we mainly focus on English and the major Romance languages such as French, Spanish, Portuguese, and Italian. This paper is a case study for Italian language. We present how to build automatically conceptual structure of automatically extracted multi-word t...


Experimenting with Extracting Lexical Dictionaries from Comparable Corpora for English-Romanian language pair

The paper describes a tool developed in the context of the ACCURAT project (Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation). The purpose of the tool is to extract bilingual lexical dictionaries (word-to-word) from comparable corpora which do not have to be aligned at any level (document, paragraph, etc.) The method implemented in this tool is intr...


A Corpus-based Machine Translation Method of Term Extraction in LSP Texts

To tackle the problems of term extraction in language specific field, this paper proposes a method of coordinating use of corpus and machine translation system in extracting terms in LSP text. A comparable corpus built for this research contains 167 English texts and 229 Chinese texts with around 600,000 English tokens and 900,000 Chinese characters. The corpus is annotated with mega-informatio...


Extracting Explicit and Implicit Causal Relations from Sparse, Domain-Specific Texts

Various supervised algorithms for mining causal relations from large corpora exist. These algorithms have focused on relations explicitly expressed with causal verbs, e.g. “to cause”. However, the challenges of extracting causal relations from domain-specific texts have been overlooked. Domain-specific texts are rife with causal relations that are implicitly expressed using verbal and non-verba...




Publication date: 2017